library(here)
source(here::here("code/scripts/source.R"))
slides_dir <- here::here("docs/slides/L02")
When we work with Bayesian models, we work with random numbers drawn from the posterior distribution. That’s nice, because you can easily summarise the sample, and you can make inferences from it. Sampling is a cognitive prosthetic: it transforms hard calculus problems into easy data-summary problems.
One line in R is sufficient to do the sampling: draw values of \(p\) from the grid, weighted by their posterior probabilities. We’ll get a big bag of numbers, distributed in the same way as our posterior. When you use Markov chains, samples are all they spit out anyway.
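A minimal sketch of that one line, assuming the globe-tossing grid approximation from the lecture (6 waters in 9 tosses, flat prior):

```r
# Grid approximation of the posterior for the globe-tossing example
p_grid <- seq(from = 0, to = 1, length.out = 1000)
prior <- rep(1, 1000)                               # flat prior
likelihood <- dbinom(6, size = 9, prob = p_grid)    # 6 waters in 9 tosses
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)

# The one line: draw 10,000 values of p, weighted by posterior probability
samples <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior)
```

`sample()` with `prob =` does the weighting, so the bag of numbers `samples` has the same shape as the posterior.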
What might you want to compute?
Commonly people want to construct intervals.
Two general kinds of intervals. One is an interval of defined boundaries. Upper left is the probability that less than half the world is covered by water. Compute it by counting the number of samples that satisfy the criterion, then dividing by the total number of samples. Upper right is the probability between 50% and 70%. Lower right: there’s an infinite number of 80% intervals.
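Counting samples makes those defined-boundary intervals one-liners. A sketch, rebuilding the same grid posterior (the cutoffs are the ones on the slides):

```r
# Grid posterior for 6 waters in 9 tosses, then sample from it
p_grid <- seq(0, 1, length.out = 1000)
posterior <- dbinom(6, size = 9, prob = p_grid)
posterior <- posterior / sum(posterior)
samples <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior)

# P(p < 0.5): count samples below the boundary, divide by the total
below_half <- sum(samples < 0.5) / length(samples)

# P(0.5 < p < 0.7): same trick with a compound condition
middle <- sum(samples > 0.5 & samples < 0.7) / length(samples)
```

The counting-and-dividing recipe works for any boundary you care about, which is the whole point of having the samples.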
Two basic kinds of defined-mass intervals. PI gives you the central interval; for a 50% interval, 25% of the mass is left in each tail. They’re not necessarily the right thing to use. What if you have an asymmetric distribution? Then the 50% percentile interval can omit the highest-density value. Use the HPDI to keep the highest point. But remember these are just summaries.
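The percentile interval is just quantiles of the samples; the HPDI needs a helper such as `rethinking::HPDI()` (assuming the rethinking package is installed). A sketch using the book’s skewed case, 3 waters in 3 tosses:

```r
# Skewed posterior: observing 3 waters in 3 tosses piles mass near p = 1
p_grid <- seq(0, 1, length.out = 1000)
posterior <- dbinom(3, size = 3, prob = p_grid)
posterior <- posterior / sum(posterior)
samples <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior)

# 50% percentile interval: 25% of mass in each tail
pi_50 <- quantile(samples, c(0.25, 0.75))

# 50% highest posterior density interval: narrowest interval holding 50% mass
# rethinking::HPDI(samples, prob = 0.5)
```

Here the percentile interval excludes values near \(p = 1\), the most probable region, which is exactly why the HPDI can be preferable for skewed posteriors.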
We care about uncertainty, and we want to summarise it. To justify a point estimate, you need to provide a cost-benefit analysis, e.g. in conservation or forecasting.
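One way to make that cost-benefit analysis explicit is a loss function. Under absolute loss, the decision that minimises expected loss is the posterior median; a sketch under the same grid setup:

```r
# Grid posterior for 6 waters in 9 tosses
p_grid <- seq(0, 1, length.out = 1000)
posterior <- dbinom(6, size = 9, prob = p_grid)
posterior <- posterior / sum(posterior)

# Expected absolute loss for each candidate point estimate d
loss <- sapply(p_grid, function(d) sum(posterior * abs(d - p_grid)))

# The candidate minimising expected loss is the posterior median
best <- p_grid[which.min(loss)]
```

Swapping in squared loss would instead pick out the posterior mean; the point is that the choice of summary follows from the costs, not from convention.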
You can’t have confidence in an interval. It’s doublespeak. “Compatibility” emphasises the uncertainty; “credibility” is the next conversation.
We’ve got the model and now we want to know what it expects. So we get it to simulate predictions.
Let’s consider three values from it. If we took the true value A and simulated a bunch of globe tosses, what would the sampling distribution look like?
If it were B instead, it would centre around 6.
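Each fixed value of \(p\) implies a sampling distribution we can simulate with `rbinom()`. A sketch (0.5 and 0.7 here are illustrative stand-ins for the slide’s A and B, which I haven’t taken from the source):

```r
# Sampling distributions of water counts in 9 tosses for two fixed values of p
draws_A <- rbinom(1e4, size = 9, prob = 0.5)   # centres near 4.5 waters
draws_B <- rbinom(1e4, size = 9, prob = 0.7)   # centres near 6.3 waters
```

Plotting `table(draws_B)` shows the distribution piling up around 6, as described above.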
We want a posterior predictive distribution, which mixes all of these together in proportion to the posterior probability of each value of \(p\). The actual predictions of the model are not any one of these sampling distributions; they’re all of them mixed together, weighted so that improbable values of \(p\) are given little weight and vice versa.
The probabilities come from the samples from the posterior distribution.
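Putting it together: simulate tosses using a different sampled value of \(p\) for each replicate, so the posterior does the weighting automatically. A sketch under the same grid setup:

```r
# Grid posterior for 6 waters in 9 tosses, sampled as before
p_grid <- seq(0, 1, length.out = 1000)
posterior <- dbinom(6, size = 9, prob = p_grid)
posterior <- posterior / sum(posterior)
samples <- sample(p_grid, size = 1e4, replace = TRUE, prob = posterior)

# Posterior predictive: each simulated count of waters uses its own draw of p,
# mixing the sampling distributions in proportion to the posterior
w <- rbinom(1e4, size = 9, prob = samples)
```

Passing the whole `samples` vector as `prob` is what distinguishes this from any single sampling distribution: improbable values of \(p\) appear rarely in `samples`, so they contribute little to `w`.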